Abstract



The mortality related to cervical cancer can be substantially reduced through early 
detection and treatment. However, current detection techniques, such as Pap smear 
and colposcopy, fail to achieve a concurrently high sensitivity and specificity. In vivo 
fluorescence spectroscopy is a technique which quickly, non-invasively and quantitatively 
probes the biochemical and morphological changes that occur in pre-cancerous tissue. 
A multivariate statistical algorithm was used to extract clinically useful information 
from tissue spectra acquired from 361 cervical sites from 95 patients at 337, 380 and 
460 nm excitation wavelengths. The multivariate statistical analysis was also employed 
to reduce the number of fluorescence excitation-emission wavelength pairs required to 
discriminate healthy tissue samples from pre-cancerous tissue samples. The use of
connectionist methods such as multi-layered perceptrons, radial basis function (RBF)
networks, and ensembles of such networks was investigated. RBF ensemble algorithms based
on fluorescence spectra potentially provide automated, near real-time pre-cancer
detection in the hands of non-experts. The results are more reliable, direct
and accurate than those achieved by either human experts or multivariate statistical
algorithms.



1 Introduction 



Cervical carcinoma is the second most common cancer in women worldwide, exceeded only
by breast cancer (American Cancer Society, 1995). The mortality related to cervical cancer
can be reduced if this disease is detected at its pre-cancerous state, known as squamous
intraepithelial lesion (SIL) (Wright et al., 1994). Even though organized screening (Pap
smear) and diagnostic (colposcopy) programs are in widespread use, approximately
15,900 new cases of cervical cancer and 4,900 cervical cancer related deaths were
reported in the United States alone in 1995 (American Cancer Society, 1995).

Currently, the primary screening tool for the detection of cervical cancer and its
precursor is the Pap smear (Kurman et al., 1994). In a Pap test, a large number of cells
obtained by scraping the cervical epithelium are smeared onto a slide which is then fixed
and stained for cytologic examination. Each smear is then examined under a microscope
for the presence of neoplastic cells (World Health Organization, 1988). The Pap smear is
unable to achieve a concurrently high sensitivity^1 and high specificity^2 (Fahey et al.,
1995). The accuracy of the Pap smear is limited by both sampling and reading errors
(Wilkinson, 1990). Approximately 60% of false-negative smears are attributed to
insufficient sampling; the remaining 40% are due to reading errors. Because of the
monotony and fatigue associated with reading Pap smears (50,000-300,000 cells per slide),
the American Society of Cytology has proposed that a cyto-technologist should be limited
to evaluating no more than 12,000 smears annually (Koss, 1989). As a result, accurate Pap
smear screening is labor intensive and requires highly trained professionals. Some new
tools (ThinPrep, Papnet, AutoPap) to assist cyto-technologists have recently been
introduced, but they are all based on invasive techniques (Korman, 1996).



A patient with a Pap smear interpreted as indicating the presence of SIL is followed up
by a diagnostic procedure called colposcopy (Kurman et al., 1994). During a colposcopic
examination, the cervix is stained with acetic acid and viewed through a low power
microscope to identify potential pre-cancerous sites. Subsequently, suspicious sites are
biopsied and then histologically examined to confirm the presence, extent and severity of
the lesion (Burke and Ducatman, 1991). Colposcopic examination in expert hands maintains a
high sensitivity at the expense of a low specificity, leading to many unnecessary
biopsies (Mitchell, 1994). In spite of the poor specificity of this technique, extensive
training is required to achieve this skill level. Furthermore, since this procedure
involves biopsy, which requires histologic evaluation, diagnosis is not immediate
(Kurman et al., 1994).



Laser induced fluorescence spectroscopy is an optical technique which has the capability
to quickly, non-invasively and quantitatively probe the biochemical and morphological
changes that occur as tissue becomes neoplastic. The altered biochemical and morphological
state of the neoplastic tissue is reflected in the spectral characteristics of the measured
fluorescence. This spectral information can be correlated to tissue histo-pathology, the
current "gold standard", to develop clinically effective screening algorithms. These
mathematical algorithms can be implemented in software, potentially enabling automated,
fast, non-invasive and accurate pre-cancer detection in the hands of non-experts. Although
a complete understanding of the quantitative information contained within a tissue
fluorescence spectrum has not been achieved, many groups have investigated the use of
fluorescence spectroscopy for real-time, non-invasive, automated characterization of
tissue pathology (Braichotte et al., 1995; Cothren et al., 1990; Lohmann et al., 198£;
Marchesini et al., 1992; Richards-Kortum et al., 1991; Schomacker et al., 1992; Yuanlong
et al., 1987).



1 Sensitivity is the correct classification percentage on the pre-cancerous tissue samples.
2 Specificity is the correct classification percentage on normal tissue samples.

A detection technique for human cervical pre-cancer based on laser induced fluorescence
spectroscopy has been developed recently (Ramanujam et al., 1996). Discrimination was
achieved using a multivariate statistical algorithm (MSA) based on Principal Component
Analysis (PCA) and Logistic Discrimination of tissue spectra acquired in vivo. This linear
method of algorithm development demonstrated an improved classification accuracy relative
to both the Pap smear and colposcopy in expert hands. In this article, we investigate
neural network based non-linear methods for algorithm development, and compare them
to both the MSA and conventional clinical methods. Specifically, we investigate the
performance of Multi-Layered Perceptron (MLP) and Radial Basis Function (RBF) networks,
and ensembles of these networks, on cervical tissue fluorescence spectra. The connectionist
methods aim at improving the classification accuracy and reliability of the MSA, as well as
simplifying the decision making process. Section 2 presents the data collection and
processing techniques. In Section 3, the MSA and the neural network based methods are
described. Section 4 contains the results of our analysis and compares the neural network
results to those of the MSA and current clinical detection methods. A discussion of the
results is given in Section 5.



2 Data Collection and Processing 

2.1 Instrumentation and Clinical Measurements 

A portable fluorimeter consisting of two nitrogen pumped-dye lasers, a fiber-optic probe
and a polychromator coupled to an intensified diode array controlled by an optical
multi-channel analyzer was utilized to measure fluorescence spectra from the cervix in vivo
at three excitation wavelengths: 337, 380 and 460 nm (Ramanujam et al., 1996). Data
acquisition, calibration and processing have been described in detail elsewhere
(Ramanujam et al., 1996).

A randomly selected group of non-pregnant patients referred to the colposcopy clinic 
of the University of Texas MD Anderson Cancer Center on the basis of abnormal cervical 
cytology was asked to participate in the in vivo fluorescence spectroscopy study. Informed 
consent was obtained from each patient who participated and the study was reviewed and 
approved by the Institutional Review Boards of the University of Texas, Austin and the 
University of Texas, MD Anderson Cancer Center. 

Each patient underwent a complete history and a physical examination including a 
pelvic exam, a Pap smear and colposcopy of the cervix, vagina and vulva. After colposcopic 
examination of the cervix, but before tissue biopsy, fluorescence spectra were acquired on 
average from two colposcopically abnormal sites, two colposcopically normal squamous sites 
and one normal columnar site (if colposcopically visible) from each patient, for a total of
361 cervical sites in 95 patients.

Tissue biopsies were obtained only from abnormal sites identified by colposcopy and
subsequently analyzed by the probe, to comply with routine patient care procedure. All
tissue biopsies were fixed in formalin and submitted for histologic examination.
Hematoxylin and eosin stained sections of each biopsy specimen were evaluated by a panel
of four board certified pathologists and a consensus diagnosis was established using the
Bethesda classification system (Wright et al., 1994). This classification system, which has
previously been used to grade cytologic specimens, has now been extended to the
classification of histology samples. Samples were classified as normal squamous (NS),
normal columnar (NC), inflammation^3, low grade (LG) SIL and high grade (HG) SIL. Table 1
provides the number of samples in the training (calibration) and test (prediction) sets.

Table 1: Histo-pathologic classification of samples from the training and test sets. Note,
biopsies for histological evaluation were not obtained from colposcopically normal squamous
and columnar tissue sites to comply with routine patient care procedure.

Histo-pathology      Target Classification    Training Set    Test Set
Normal Squamous      non-SIL                  94              94
Normal Columnar      non-SIL                  13              14
Inflammation         non-SIL                  15              14
LG SIL               SIL                      23              24
HG SIL               SIL                      35              35



2.2 Spectral Data 

Figure 1 illustrates average fluorescence spectra per site acquired from cervical sites at
337 nm excitation from a typical patient. All fluorescence intensities are reported in the 
same set of calibrated units. Evaluation of the spectra at 337 nm excitation indicates 
that the fluorescence intensity of SILs (LG and HG) is less than that of the corresponding 
normal squamous tissue; however, their fluorescence intensity is greater than that of the 
corresponding normal columnar tissue over the entire emission spectrum. 

Figure 2 illustrates average fluorescence spectra per site acquired from cervical sites
at 380 nm excitation, from the same patient. In Figure 2, the fluorescence intensity of
SILs is less than that of the corresponding normal squamous tissue, with the LG SIL
exhibiting the weakest fluorescence intensity over the entire emission spectrum. Note that
the fluorescence intensity of the normal columnar sample is indistinguishable from that
of the HG SIL. Figure 3 illustrates spectra at 460 nm excitation from the same patient.
Evaluation of Figure 3 indicates that the fluorescence intensity of SILs is less than that
of the corresponding normal squamous tissue and greater than that of the corresponding
normal columnar sample over the entire emission spectrum.

3 In this paper we will not focus on the classification of tissues with inflammation.
Evaluation of these tissues using both MSA and neural networks indicates that they are
nearly indistinguishable from SILs based on the spectral data presented here (Ramanujam
et al., 1995a; Ramanujam et al., 1995b). To remedy the situation, different optical
spectroscopic techniques are needed.

Figure 2: Fluorescence spectra from a typical patient at 380 nm excitation.

Tissue fluorescence spectra at 337 nm excitation consist of intensities at a total of 59
emission wavelengths; tissue spectra at 380 nm excitation consist of intensities at a total
of 56 emission wavelengths, and those at 460 nm excitation consist of intensities at 45
emission wavelengths. Hence, fluorescence spectra at all three excitation wavelengths
comprise fluorescence intensities at a total of 59 + 56 + 45 = 160 excitation-emission
wavelength pairs.

3 Algorithm Development



In this section, the development of the multivariate statistical algorithm and of the
neural network based algorithms is described. Each type of algorithm was utilized to
develop a detection method that can effectively discriminate between SILs and non-SILs
(normal squamous and normal columnar).



3.1 Multivariate Statistical Algorithms 

3.1.1 Full-Parameter Multivariate Statistical Algorithm 



The Multivariate Statistical Algorithm (MSA) development described in (Ramanujam et al.,
1996) consists of the following five steps:

1. pre-processing to reduce inter-patient and intra-patient variation of fluorescence
spectra from a tissue type, using either normalization or normalization followed by
mean-scaling,

2. dimension reduction of the pre-processed tissue fluorescence spectra using Principal
Component Analysis (PCA),

3. selection of diagnostically relevant principal components, using an unpaired, one-sided
Student's t-test,

4. development of a classification algorithm based on logistic discrimination, using the
diagnostically relevant principal components, and finally

5. retrospective and prospective evaluation of the algorithm's accuracy on a training and
test set, respectively.
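
Step 1 can be illustrated with a short sketch. The exact definitions of the two pre-processing operations are given in (Ramanujam et al., 1996) and are not restated here, so the code below rests on two labeled assumptions: normalization divides each spectrum by its own peak intensity, and mean-scaling subtracts a patient's mean spectrum from each of that patient's spectra. The function names and toy numbers are hypothetical.

```python
from statistics import fmean


def normalize(spectrum):
    """Divide a spectrum by its peak intensity (assumed definition of normalization)."""
    peak = max(spectrum)
    return [value / peak for value in spectrum]


def mean_scale(patient_spectra):
    """Subtract the patient's mean spectrum from each of that patient's spectra
    (assumed definition of mean-scaling)."""
    n_points = len(patient_spectra[0])
    mean_spectrum = [fmean(s[i] for s in patient_spectra) for i in range(n_points)]
    return [[v - m for v, m in zip(s, mean_spectrum)] for s in patient_spectra]


# Toy data (not from the study): two sites from one hypothetical patient,
# four emission wavelengths each.
sites = [[2.0, 4.0, 8.0, 4.0], [1.0, 2.0, 2.0, 1.0]]
normalized = [normalize(s) for s in sites]   # input style for constituent algorithm (1)
scaled = mean_scale(normalized)              # input style for constituent algorithm (2)
```

Under these assumptions, normalization removes absolute intensity differences between sites, while mean-scaling expresses each site relative to the patient's own average.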

Figure 4: Schematic representation of the composite MSA algorithm.

This process of algorithm development was applied to tissue fluorescence spectra acquired
at all three excitation wavelengths, as described in detail in (Ramanujam et al., 1996).
Discrimination between the SILs and the two normal tissue types was achieved
using a composite algorithm of two independently developed constituent algorithms.
Constituent algorithm (1), which is based on tissue spectra that have been pre-processed by
normalization, discriminates between SILs and normal squamous tissue samples.
Constituent algorithm (2), which is based on tissue spectra that have been pre-processed by
normalization followed by mean-scaling, discriminates between SILs and normal columnar
tissue samples. The classification outputs from both constituent algorithms were used to
determine whether a sample being evaluated is SIL or non-SIL. A sample is first presented
to constituent algorithm (1). If it is classified as non-SIL, the algorithm terminates. If
it is classified as SIL, then the sample is presented to constituent algorithm (2), and the
result of that algorithm determines the final classification of the tissue sample.
Figure 4 illustrates this procedure.
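
The sequential decision rule can be written as a few lines of code. This is a minimal sketch: the two constituent classifiers are passed in as functions, and the toy threshold rules used to exercise it are hypothetical stand-ins, not the trained algorithms.

```python
def composite_classify(normalized, mean_scaled, algorithm_1, algorithm_2):
    """Apply the two-step rule: algorithm (1) first, algorithm (2) only if needed.

    normalized  -- features pre-processed by normalization
    mean_scaled -- features pre-processed by normalization and mean-scaling
    """
    if algorithm_1(normalized) == "non-SIL":
        return "non-SIL"              # algorithm (1) rules out SIL: terminate
    return algorithm_2(mean_scaled)   # algorithm (2) gives the final label


# Hypothetical stand-in classifiers, for illustration only.
def toy_algorithm_1(features):
    return "SIL" if features[0] < 0.5 else "non-SIL"


def toy_algorithm_2(features):
    return "SIL" if features[0] > 0.0 else "non-SIL"


label_a = composite_classify([0.8], [0.3], toy_algorithm_1, toy_algorithm_2)
label_b = composite_classify([0.2], [0.3], toy_algorithm_1, toy_algorithm_2)
label_c = composite_classify([0.2], [-0.1], toy_algorithm_1, toy_algorithm_2)
```

A sample is labeled SIL only if both constituent algorithms agree, so the composite can never be more sensitive than algorithm (1) alone.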



3.1.2 Reduced-Parameter Multivariate Statistical Algorithms

The full-parameter MSA utilizes fluorescence spectra at all three excitation wavelengths to
develop a classification scheme for cervical pre-cancer detection; the fluorescence spectra
at these three excitation wavelengths correspond to fluorescence intensities at a total of
160 excitation-emission wavelength pairs (5 nm spectral resolution). However, there is a
significant cost penalty for using all 160 values. A more cost-effective fluorescence
imaging system can be developed if the number of excitation-emission wavelength pairs at
which fluorescence intensities need to be recorded is significantly reduced. For example,
if the number of excitation-emission wavelength pairs that need to be measured can be
reduced by an order of magnitude, the polychromator and intensified diode array can be
replaced by a mechanical filter assembly and a single channel detector. The resulting
system represents a substantial decrease in the cost and complexity of the instrumentation.

We have shown that component loadings calculated from Principal Component Analysis
(Ramanujam et al., 1996) can be used to reduce the number of fluorescence
excitation-emission wavelength pairs required to generate the constituent algorithms from
160 to 13, with a minimal decrease in classification accuracy. The component loadings
represent the correlation between a principal component and the pre-processed fluorescence
spectra at each excitation-emission wavelength pair. More precisely, the diagnostically
relevant principal components of the full-parameter set (160 variables) selected using the
Student's t-test are used to calculate the component loadings. The intensity/wavelength
pairs that show a strong correlation to the relevant principal components form the
reduced-parameter set. Table 2 shows the fluorescence intensities at the reduced number of
excitation-emission wavelength pairs. These pairs are used to redevelop constituent
algorithms (1) and (2) using the MSA process.
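
The selection by component loadings can be sketched as follows. This is a simplified illustration rather than the procedure of (Ramanujam et al., 1996): it assumes the scores of a single principal component are already available, treats the loading of each variable as the Pearson correlation between that variable and the scores, and keeps the variables with the largest absolute loading. How strong a correlation had to be for a pair to be retained is not stated in the text, so the cut-off `n_keep` is a hypothetical parameter.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def select_by_loading(spectra, pc_scores, n_keep):
    """Return the indices of the n_keep variables most correlated with the PC scores,
    together with all loadings. spectra[s][j] is variable j of sample s."""
    n_vars = len(spectra[0])
    loadings = [pearson([s[j] for s in spectra], pc_scores) for j in range(n_vars)]
    ranked = sorted(range(n_vars), key=lambda j: abs(loadings[j]), reverse=True)
    return sorted(ranked[:n_keep]), loadings


# Toy data (not from the study): four samples, three wavelength pairs.
spectra = [[1.0, 4.0, 2.0], [2.0, 3.0, 1.0], [3.0, 2.5, 2.0], [4.0, 1.0, 1.0]]
pc_scores = [0.1, 0.2, 0.3, 0.4]
selected, loadings = select_by_loading(spectra, pc_scores, n_keep=2)
```

With several relevant principal components, the same ranking would be repeated per component and the selected pairs merged.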



Table 2: Fluorescence intensities at 13 excitation-emission wavelength pairs needed to
redevelop the two reduced-parameter constituent algorithms. Aside from the pre-processing,
the two sets only differed in two selections, identified by *.

Features for Algorithm (1)      Features for Algorithm (2)
(Normalized)                    (Normalized and Mean Scaled)
(excitation, emission)          (excitation, emission)
337, 410 nm                     337, 410 nm
337, 430 nm                     337, 430 nm
337, 510 nm                     337, 510 nm
337, 580 nm                     337, 580 nm
380, 410 nm                     380, 410 nm
380, 430 nm                     380, 430 nm
380, 510 nm                     380, 510 nm
380, 580 nm                     380, 580 nm
380, 640 nm                     380, 600 nm *
460, 580 nm                     460, 580 nm
460, 600 nm                     460, 600 nm
460, 620 nm                     460, 620 nm
460, 640 nm                     460, 660 nm *



3.2 Algorithms Based on Neural Networks 



The second stage of algorithm development consists of evaluating the applicability of
neural networks to this problem. For this study, we consider two types of feed-forward
neural networks, the Multi-Layered Perceptron (MLP) and the Radial Basis Function (RBF)
network (Tumer et al., 1997).

3.2.1 Multi-Layered Perceptrons



The MLP, probably the most commonly used neural network architecture, is a feed-forward
neural network with an input layer, an output layer and possibly multiple hidden layers.
Each layer is connected only to the subsequent layer by variable weights which are adjusted
to minimize a predetermined cost function, such as the Mean Square Error (MSE). For an
MLP with one hidden layer, the responses of the kth hidden unit, $h_k$, and the jth output,
$o_j$, are respectively given by:

$$h_k = g\left(\sum_i v_{ki} x_i\right), \tag{1}$$

and

$$o_j = f\left(\sum_k w_{jk} h_k\right), \tag{2}$$

where the input-to-hidden connection strengths are given by $v$, the hidden-to-output
connection strengths are given by $w$, and indices $i$ and $k$ sum over the input ($x$)
and hidden units, respectively. The activation functions $f(\cdot)$ and $g(\cdot)$ are
sigmoidal functions.

In order to adapt the weights, the backpropagation algorithm is generally used. The
principle of this algorithm is based on distributing the "blame", or the contribution of
each unit to the overall error, across the proper weights. Further details can be found in
(Haykin, 1994; Bishop, 1995). In this study we only explore MLPs with a single hidden layer.
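
The forward pass of a single-hidden-layer MLP, i.e. a hidden response as a sigmoid of a weighted sum of the inputs followed by an output response as a sigmoid of a weighted sum of the hidden responses, can be sketched in a few lines. The sketch assumes the logistic sigmoid for both activation functions (the text only states that they are sigmoidal) and, like the equations, omits bias terms. The weight values are arbitrary illustrations.

```python
from math import exp


def sigmoid(a):
    """Logistic sigmoid, used here for both f and g (an assumption)."""
    return 1.0 / (1.0 + exp(-a))


def mlp_forward(x, v, w):
    """Forward pass of a one-hidden-layer MLP.

    v[k][i] -- weight from input i to hidden unit k
    w[j][k] -- weight from hidden unit k to output j
    """
    hidden = [sigmoid(sum(v_ki * x_i for v_ki, x_i in zip(v_k, x))) for v_k in v]
    outputs = [sigmoid(sum(w_jk * h_k for w_jk, h_k in zip(w_j, hidden))) for w_j in w]
    return hidden, outputs


# Two inputs, three hidden units, two outputs; arbitrary illustrative weights.
v = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
w = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
hidden, outputs = mlp_forward([1.0, 2.0], v, w)
```

Backpropagation would adjust `v` and `w` by gradient descent on the chosen cost function; that training loop is left out of this sketch.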



3.2.2 Radial Basis Function Networks 

Radial Basis Function networks are feed-forward networks with a single hidden layer where
the activation function is a radially symmetric basis function. The output units perform
a weighted sum over the outputs of the hidden units (also called kernels). One class of
radial basis functions that is of particular interest consists of those with Gaussian
kernels, that is, where the basis functions $R_j(x)$ have the following activation function
(Haykin, 1994):

$$R_j(x) = e^{-\frac{\|x - x_j\|^2}{2\sigma_j^2}}, \tag{3}$$

where $\sigma_j$ determines the width of the receptive field and $x_j$ the centroid of
the jth kernel. The jth hidden node has a maximum output of 1 when the input
$x = x_j$. Important parameters in the design of RBF networks include the number, location
and receptive field widths of the kernels.
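
A Gaussian RBF network's forward pass can be sketched as below. The sketch assumes the common Gaussian form exp(-||x - x_j||^2 / (2 sigma_j^2)) for the kernel and a plain weighted sum at the output units; the centers, widths and weights are arbitrary illustrative values.

```python
from math import exp


def gaussian_kernel(x, center, sigma):
    """Response of one Gaussian kernel; equals 1 when x coincides with the center."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return exp(-sq_dist / (2.0 * sigma ** 2))


def rbf_forward(x, centers, sigmas, weights):
    """Kernel activations followed by a weighted sum per output unit.

    weights[j][k] -- weight from kernel k to output j
    """
    activations = [gaussian_kernel(x, c, s) for c, s in zip(centers, sigmas)]
    outputs = [sum(w * r for w, r in zip(w_row, activations)) for w_row in weights]
    return activations, outputs


# Two kernels in a two-dimensional input space; the input sits on the first center.
centers = [[1.0, 2.0], [4.0, 6.0]]
sigmas = [1.0, 5.0]
weights = [[1.0, 0.0], [0.0, 1.0]]
acts, outs = rbf_forward([1.0, 2.0], centers, sigmas, weights)
```

Because each kernel responds only near its center, the placement of the centers largely determines which regions of input space the network can separate.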



3.2.3 Combining Multiple Networks 

The performance of a given MLP or RBF network depends on many parameters, including
size, learning rate, training strategy and initial weights. These differences result in
different classification decisions, making the selection of a single "best" network a
delicate matter. This problem is further compounded when the number of training patterns
is limited, because the definition of "best" depends on the particular validation set
chosen. In such cases, it is difficult to ascertain whether one network will outperform
all others given a different test set, as the validation sets are small. Furthermore,
selecting only one classifier discards a large amount of potentially relevant information.
In order to use all the available data, and to increase both the performance and the
reliability of the methods, one can pool the outputs of the individual classifiers before
a classification decision is made (Hampshire and Waibel, 1992; Hansen and Salamon, 1990;
Perrone and Cooper, 1993; Tumer and Ghosh, 1996a; Wolpert, 1991).



In this study we use the median combiner, $f^{med}$, which belongs to the class of order
statistics combiners introduced in (Tumer and Ghosh, 1995), and the averaging combiner,
$f^{ave}$, which performs an arithmetic average of the corresponding outputs:

$$f_i^{med}(x) = \begin{cases} \dfrac{f_i^{\frac{N}{2}:N}(x) + f_i^{\frac{N}{2}+1:N}(x)}{2} & \text{if } N \text{ is even,} \\ f_i^{\frac{N+1}{2}:N}(x) & \text{if } N \text{ is odd,} \end{cases} \tag{4}$$

and

$$f_i^{ave}(x) = \frac{1}{N} \sum_{m=1}^{N} f_i^m(x), \tag{5}$$

where $N$ represents the number of available classifiers, $f_i^m(x)$ the ith output of the
mth classifier for input $x$, and $f_i^{k:N}(x)$ the kth smallest of the $N$ values of the
ith output. We selected these two combiners because of their simplicity in both
interpretation and implementation, and because they typically result in good and robust
performance (Kittler et al., 1996; Tumer and Ghosh, 1995).
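
Both combiners operate output by output across the pool of classifiers. A minimal sketch, using the standard library's `median` and `fmean` (for an even number of values, `median` averages the two middle order statistics); the classifier outputs are made-up numbers:

```python
from statistics import fmean, median


def combine(outputs_per_classifier, combiner):
    """Pool classifier outputs; outputs_per_classifier[m][i] is the i-th output
    of the m-th classifier for one input pattern."""
    n_outputs = len(outputs_per_classifier[0])
    return [combiner([out[i] for out in outputs_per_classifier])
            for i in range(n_outputs)]


# Three hypothetical classifiers, two outputs each (e.g. posteriors for SIL, non-SIL).
three = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
med_odd = combine(three, median)    # middle value of each output
ave_odd = combine(three, fmean)     # arithmetic mean of each output

# Adding a fourth classifier makes N even: the median averages the two middle values.
four = three + [[0.4, 0.6]]
med_even = combine(four, median)
```

The median ignores extreme outputs from individual networks, whereas the average is influenced by every member of the pool.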



4 Results 

In this section we discuss the application of various neural network based algorithms to the
spectral data discussed in Section 2. In order to establish the validity of using the neural
network ensembles to perform this task, we conducted the following sets of experiments:

1. To evaluate the suitability of the neural network methods, we re-developed constituent 
algorithms (1) and (2) with the RBF and MLP ensembles using: 

(a) the full-parameter, pre-processed data sets; 

(b) the diagnostically relevant principal components obtained from the full-parameter, 
pre-processed data sets; 

(c) the reduced-parameter, pre-processed data sets. 






2. We reduced the two-step algorithm developed using MSA to a single-step algorithm
using RBF and/or MLP ensembles.

3. We produced full comparative results showing the neural network ensembles' classification
accuracy and reliability relative to the MSA, Pap smear and colposcopy.

4.1 Neural Network Ensembles on Full-Parameter Set 

The first step in applying the neural network ensembles to this problem consisted of
determining whether the algorithms were suited for this task. To that end, both the MLP
and RBF ensembles were used to separate the normal squamous tissues from the SILs
(constituent algorithm (1)). The task proved to be impossible using the full 160
parameters, mainly because of the small number of training samples. Constraining the number
of weights required to handle 160 parameters with the available data resulted in a highly
ill-posed problem (Wahba, 1982). As a result the networks tried to memorize the training
data and performed poorly on the test set. In order to avoid this pitfall, we used the three
diagnostically relevant principal components from the full-parameter data set containing
normalized fluorescence spectra. (Note, the MSA was developed using these same three
PCs, because it also was unable to solve this problem using the full-parameter data.) Both
networks had two outputs, each representing the posterior probability of the corresponding
class. The MLP networks had a single hidden layer with three hidden units, and the RBF
networks had three kernels which were initialized by a k-means algorithm on the training
set^4.

The ensemble results reported in Table 3 correspond to the pooling of 20 different
classifiers (MLPs or RBFs), each of which started from a different random weight
initialization, using the average and median combiners, respectively^5. All the results
reported in this article are based on test set performance. We report the sensitivity and
specificity values separately, rather than a single classification rate, to emphasize the
difference between a false-positive and a false-negative. For an application such as
pre-cancer detection, the cost of a misclassification varies greatly from one class to
another. Erroneously labeling a healthy tissue as pre-cancerous can be corrected when
further tests are performed. Labeling a pre-cancerous tissue as healthy, however, can lead
to disastrous consequences. Therefore, for algorithm (1), we increased the cost of a
misclassified SIL until the sensitivity^6 reached a satisfactory level (comparable to the
sensitivity of current clinical detection).
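
The two figures of merit reported throughout this section follow directly from their definitions: sensitivity is the fraction of pre-cancerous (SIL) samples classified correctly, and specificity is the fraction of normal samples classified correctly. A minimal sketch with made-up labels (not data from the study):

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="SIL"):
    """Return (sensitivity, specificity) as fractions in [0, 1]."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    true_neg = sum(1 for t, p in pairs if t != positive and p != positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Toy example: 4 SIL samples and 4 normal samples.
truth = ["SIL"] * 4 + ["non-SIL"] * 4
predicted = ["SIL", "SIL", "SIL", "non-SIL",
             "non-SIL", "non-SIL", "SIL", "SIL"]
sens, spec = sensitivity_specificity(truth, predicted)
```

Reporting the pair separately, rather than the overall accuracy, keeps the missed SILs (false negatives) visible.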

For this experiment, the RBF based combiners provide higher specificity than either
the MLP ensembles or the MSA, for a similar sensitivity. This experiment was conducted
to ensure that neural network ensembles could duplicate the MSA results (Ramanujam
et al., 1996), using the principal components extracted from the full-parameter set.

4 The appropriate sizes of both the MLP and the RBF network were determined experimentally.
Because we found that the performance was comparable over a fairly large range of network
sizes, it was not necessary to use sophisticated methods.

5 The gains due to combining were minimal if more classifiers were used.

6 In the results reported in this section, the cost of misclassifying a SIL was two times
the cost of misclassifying a normal tissue sample.

Table 3: Accuracy of constituent algorithm (1) for differentiating between SILs and normal
squamous tissues, based on the diagnostically relevant principal components of the
full-parameter set containing normalized spectra. All results are based on test set
performance.

Algorithm       Specificity     Sensitivity
MSA             68%             88%
RBF-single      71% ±3.8%       86% ±2.1%
MLP-single      65% ±0.0%       83% ±1.1%
RBF-ave         74% ±0.7%       86% ±0.5%
RBF-med         72% ±1.7%       86% ±1.5%
MLP-ave         65% ±0.0%       83% ±0.0%
MLP-med         65% ±0.0%       84% ±0.7%



4.2 Neural Networks on Reduced-Parameter Set (2-step Algorithm) 

The second step in applying neural networks to this problem consisted of determining
whether the neural network ensemble performance on the reduced-parameter data set was
acceptable, by retracing the development of the two-step process outlined for the
multivariate statistical algorithm. Specifically, the neural network ensemble was applied
to the pre-processed reduced-parameter data sets used to develop constituent algorithms (1)
and (2).

For constituent algorithm (1) the RBF kernels were initialized using a k-means clustering
algorithm on the training set containing normal squamous tissue samples and SILs. The
RBF networks had 10 kernels, whose locations and spreads were adjusted during training.
The MLP had one hidden layer with 10 hidden units. For constituent algorithm (2), we
selected 10 kernels, half of which were fixed to patterns from the columnar normal class,
while the other half were initialized using a k-means algorithm. Neither the kernel
locations nor their spreads were adjusted during training. This process was adopted to
rectify the large discrepancy between the number of samples from each category (13 for
columnar normal vs. 58 for SILs). In this case, the MLP algorithm had 10 hidden units.
Training was stopped when the improvement on the training set slowed down sufficiently
to suggest further training would cause overtraining^7.

Once a stopping time was selected, 20 cases were run for each algorithm^8.

7 In general, Leave-One-Out or K-fold cross validation would be more desirable. However,
because the results were similar over a large window of stopping times, this more naive
method seemed adequate.

8 Each run has a different initial set of kernels/spreads/weights.
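
The k-means initialization of the RBF kernel locations mentioned above can be sketched as follows. The text does not say which k-means variant or starting points were used, so this sketch makes two labeled assumptions: it uses the standard Lloyd iteration, and it starts from the first k training points to stay deterministic. The toy points are made up.

```python
def kmeans(points, k, n_iter=20):
    """Lloyd's k-means; returns k cluster centers usable as initial RBF kernel locations.

    Starting from the first k points is an assumption made for reproducibility.
    """
    centers = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else old
            for members, old in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy two-dimensional data with two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
kernel_centers = kmeans(points, k=2)
```

For constituent algorithm (2), the text fixes half of the kernels to columnar normal patterns; in this sketch that would amount to running `kmeans` for the remaining half only and concatenating the two sets of centers.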

Table 4: Accuracy of constituent algorithm (1) for differentiating SILs and normal squamous
tissues, based on the reduced-parameter set containing normalized spectra.

Algorithm       Specificity     Sensitivity
MSA             63%             90%
RBF-single      66% ±2.7%       84% ±2.0%
MLP-single      61% ±0.0%       91% ±0.2%
RBF-ave         66% ±0.6%       90% ±0.0%
RBF-med         66% ±1.1%       90% ±0.7%
MLP-ave         61% ±0.0%       91% ±0.0%
MLP-med         61% ±0.0%       91% ±0.0%



Table 5: Accuracy of constituent algorithm (2) for differentiating SILs and normal columnar
tissues, based on the reduced-parameter set containing normalized and mean-scaled spectra.

Algorithm       Specificity     Sensitivity
MSA             36%             97%
RBF-single      35% ±15%        97% ±1.6%
MLP-single      47% ±6.9%       89% ±3.5%
RBF-ave         37% ±5.0%       97% ±0.0%
RBF-med         44% ±7.5%       97% ±0.0%
MLP-ave         50% ±0.0%       88% ±0.7%
MLP-med         50% ±0.0%       89% ±2.5%



The ensemble results reported are based on the pooling of 20 different runs of RBF
networks, initialized and trained as described in the previous section. Once again we
increased the cost of misclassifying a SIL in order to increase the sensitivity at the
expense of reducing the specificity. For algorithm (1), the cost of a misclassified SIL was
2.5 times the cost of a misclassified normal tissue sample. The sensitivity and specificity
values for constituent algorithm (1) based on MSA, MLP and RBF ensembles are provided in
Table 4. Table 5 presents sensitivity and specificity values for constituent algorithm (2)
obtained from MSA, and MLP and RBF ensembles. In this case, there was no need to increase
the cost of a misclassified SIL for the RBF network, because SILs dominate the training
set. For the MLPs, however, the cost of normal columnars had to be increased
in order to obtain non-trivial classification decisions^9.
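
The effect of unequal misclassification costs can be illustrated with a minimum expected cost decision rule. The text does not specify how the costs entered the networks (they may well have been applied during training), so this sketch makes a labeled assumption: the cost ratio is applied at decision time to an estimated posterior probability of SIL. The function and parameter names are hypothetical.

```python
def min_cost_label(p_sil, cost_missed_sil=2.5, cost_false_alarm=1.0):
    """Choose the label with the lower expected cost, given the posterior p_sil.

    cost_missed_sil  -- cost of labeling a SIL as non-SIL
    cost_false_alarm -- cost of labeling a normal sample as SIL
    """
    expected_cost_if_non_sil = cost_missed_sil * p_sil
    expected_cost_if_sil = cost_false_alarm * (1.0 - p_sil)
    return "SIL" if expected_cost_if_sil < expected_cost_if_non_sil else "non-SIL"


# With a 2.5:1 cost ratio, the decision threshold on p_sil drops from 1/2 to 1/3.5.
label_weighted = min_cost_label(0.3)                                  # 0.70 < 0.75
label_equal = min_cost_label(0.3, cost_missed_sil=1.0)                # 0.70 > 0.30
label_low = min_cost_label(0.2)                                       # 0.80 > 0.50
```

Raising the cost of a missed SIL lowers the posterior needed to call a sample SIL, which increases sensitivity and reduces specificity, the trade-off described in the text.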

For both algorithms (1) and (2), the RBF based combiners provide higher specificity 

9 When the cost of a misclassified SIL was the same as the cost of a misclassified normal columnar, all 
patterns were classified as SILs by the MLP. Only when the cost of normal columnars was increased did the 
MLP start to make non-trivial classification decisions. 
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than the MSA, while the MLP ensemble does so only for algorithm (1). Furthermore in the 
case of the RBF algorithm, this increase is achieved without a decrease in the sensitivity. 
The median combiner provides results similar to those of the average combiner, except for 
algorithm (2) where it provides better specificity. From these experiments we can conclude 
that the RBF network is better suited for this task than the MLP. While in some other 
work such as classification of sonar signals we have found combining MLPs with RBFs to 
be fruitful (Ghosh et al., 1996), in this problem we were surprised to find that combining
MLPs and RBFs always gave worse results than combining RBFs alone. We hypothesize
that the fine tuning required to accommodate the varying number of class samples poses
severe problems for the global computations of MLPs. The RBF networks significantly alleviate
this problem by placing the kernels in the appropriate places.
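The average and median combiners compared above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `combine` and the representation of each network's output as a list of per-class scores are assumptions made here.

```python
from statistics import mean, median

def combine(outputs, rule="ave"):
    """Pool the outputs of several networks for one input pattern.

    outputs -- list with one entry per network; each entry is a list of
               per-class scores (assumed format, not taken from the paper)
    rule    -- "ave" for the average combiner, "med" for the median combiner
    """
    if rule not in ("ave", "med"):
        raise ValueError("rule must be 'ave' or 'med'")
    pool = mean if rule == "ave" else median
    n_classes = len(outputs[0])
    # Pool class by class across the networks.
    return [pool([net[c] for net in outputs]) for c in range(n_classes)]
```

With three networks scoring a pattern as `[0.9, 0.1]`, `[0.6, 0.4]` and `[0.1, 0.9]`, the median combiner returns the middle score for each class, so a single dissenting network moves it less than it moves the average.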

We conducted this experiment not only to demonstrate that the reduced-parameter
data set does not compromise the performance of the network ensembles, but also to
compare the RBF and MLP ensembles. Having established the validity of the reduced-parameter
set, we performed the remaining experiments on the reduced-parameter data set only. This
is a significant step, since it allows us to solely depend on the 13 parameters obtained 
from the component loadings, rather than on the principal components from the original 
160 values. Furthermore, the results of the first two steps indicate that the RBF network 
ensembles not only duplicate and surpass the MSA results, but also outperform the MLP 
algorithms at every step. Since the MLP ensembles fail to show any areas where they may 
be more desirable than the RBF network ensembles, we will restrict the remainder of the 
experiments to RBF network ensembles only. 



In Sections 4.1 and 4.2, we used the two constituent algorithms separately, to discrim- 
inate different types of normal tissue from SILs. In order to obtain the final discrimination 
between normal tissue and SILs, constituent algorithms (1) and (2) need to be used se- 
quentially. This two step approach, highlighted in Figure ||], was specifically developed for 
the multivariate statistical analysis, which performed best when the decision tasks were 
simplified. In the next section we present a more direct approach that uses the strengths 
of the neural network ensembles to reduce the multi-step classification scheme to a direct, 
one-step, process. 
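The sequential use of the two constituent algorithms can be pictured as a short cascade. The sketch below is only an illustration: the order of the two checks, the function name `two_step_classify`, and the callables standing in for the trained classifiers are assumptions made here, not details given in the text.

```python
def two_step_classify(x1, x2, is_sil_alg1, is_sil_alg2):
    """Cascade two binary classifiers; call a sample SIL only if both do.

    x1, x2      -- the same tissue site after the pre-processing used by
                   constituent algorithm (1) and (2), respectively
    is_sil_alg1 -- hypothetical callable, True if algorithm (1) votes SIL
    is_sil_alg2 -- hypothetical callable, True if algorithm (2) votes SIL
    """
    if not is_sil_alg1(x1):
        return "normal"   # ruled out by the first constituent algorithm
    if not is_sil_alg2(x2):
        return "normal"   # ruled out by the second constituent algorithm
    return "SIL"
```

Each stage only has to separate SILs from one kind of normal tissue, which is the simplification the multivariate statistical analysis relied on.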



4.3 Single Step Classification using Reduced-Parameter Set 

In this section, we examine the potential of RBF ensembles for separating SILs from normal
tissue samples in a single classification step. Because the pre-processing for algorithms (1)
and (2) differs (normalization vs. normalization followed by mean scaling), we formed
26-dimensional inputs by concatenating the two reduced feature sets described in
Section 3.1.2. We initialized 10 kernels using a k-means algorithm on a trimmed version
of the training set (footnote 11). Without this trimming process, few patterns
belonging to the columnar normal class are selected as kernels due to their low prominence
in the training set, resulting in poor performance when such a sample is seen. During
training, the kernel locations and spreads were not adjusted, to allow kernels to remain in
sections of the input space where few patterns are observed. The cost of a misclassified SIL
was set at 2.5 times the cost of a misclassified normal tissue sample, in order to provide the
best sensitivity/specificity pair. The average and median combiner results are obtained by
pooling 20 RBF networks (footnote 12).
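The kernel initialization described above (class-balanced trimming, then k-means, then kernels held fixed) can be sketched as follows. This is a hedged illustration in plain Python: the function names, the Gaussian kernel form, and the single shared spread are assumptions made here; the text only states that 10 kernels were initialized by k-means on a trimmed set and that kernel locations and spreads were not adjusted during training.

```python
import math
import random
from collections import defaultdict

def trim_balanced(X, y, rng):
    """Subsample every class down to the size of the rarest class."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n = min(len(pts) for pts in by_class.values())
    Xt, yt = [], []
    for label, pts in by_class.items():
        for p in rng.sample(pts, n):
            Xt.append(p)
            yt.append(label)
    return Xt, yt

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd iterations; returns k centres (the kernel locations)."""
    centres = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centres[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centre if a cluster becomes empty
                centres[j] = [sum(col) / len(g) for col in zip(*g)]
    return centres

def rbf_activations(x, centres, spread):
    """Gaussian kernel responses; centres and spread are not trained further."""
    return [math.exp(-sq_dist(x, c) / (2.0 * spread ** 2)) for c in centres]
```

In this reading only the output-layer weights on top of `rbf_activations` would be fitted on the full training set, while the trimmed set is used once, for `kmeans`.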

Table 6: One step RBF algorithm compared to multi-step MSA. (Based on reduced-parameter set.)

Algorithm            Specificity     Sensitivity

2-step MSA           65%             84%
2-step RBF-ave       65% ± 2%        87% ± 1%
2-step RBF-med       67% ± 2%        87% ± 1%

1-step RBF-single    65% ± 9%        89% ± 2.8%
1-step RBF-ave       67% ± 0.7%      91% ± 1.5%
1-step RBF-med       65.5% ± 0.5%    91% ± 1%



Table 6 shows the performance of the algorithms for a given SIL misclassification cost.
For comparison purposes, the results of the 2-step RBF ensemble algorithms are also
provided. These algorithms perform the same steps as the MSA (using the results presented
in Section 3.1.2 for each constituent algorithm).

As we discussed above, for an application such as pre-cancer detection, it is far more 
critical to increase the classification accuracy of some classes than others, to eliminate 
certain types of errors. By making the algorithm more compact, the one step algorithm 
also makes this trade-off more visible. 



4.3.1 Sensitivity-Specificity Tradeoff 

In this subsection we detail the interaction between the cost of misclassification and the 
variation in the sensitivity/specificity of the RBF ensembles. Since it may be required to 
reach a predetermined sensitivity, we have varied the cost of misclassifying a SIL to obtain 
a wide range of sensitivity/specificity pairs. Table 7 shows the specificity and sensitivity for
various costs of a misclassification.

11 The trimmed set has the same number of patterns from each class. Thus, it forces each class to have a
similar number of kernels. This set is used only for initializing the kernels, not while training.
12 This procedure is repeated 10 times, in order to determine the variability in the ensembles.

Table 7: Effect of misclassification cost on 1-step RBF algorithm.

Algorithm     Cost of SIL misclassification    Specificity    Sensitivity

2-step MSA    -                                65%            84%
RBF-ave       1                                85%            61%
RBF-med       1                                84%            61%
RBF-ave       2                                75%            88%
RBF-med       2                                74%            86%
RBF-ave       2.5                              67%            91%
RBF-med       2.5                              66%            91%
RBF-ave       3                                63%            93%
RBF-med       3                                59%            93%
RBF-ave       4                                55%            95%
RBF-med       4                                52%            97%
RBF-ave       5                                39%            97%
RBF-med       5                                37%            97%

The performance of the RBF ensembles at various costs of misclassifying SILs shows
clear improvements over the two-step MSA algorithm. If the specificity is
required to be above 60%, for example, using a SIL misclassification cost of three improves
the sensitivity of the MSA by a significant 10%, using the average RBF ensembles. If, on
the other hand, a sensitivity above 83% is required, using a SIL misclassification cost of
two provides an improvement of 12% over the specificity of the MSA.
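The way a rising SIL misclassification cost trades specificity for sensitivity can be reproduced on toy data. The sketch below rests on assumptions that are not stated in the text: the cost is applied at decision time to a posterior-like score, so that a pattern is called SIL when score × cost exceeds (1 − score), i.e. when score > 1/(1 + cost). The authors may instead have applied the cost during training; the function names and the 0/1 label coding (1 = SIL) are likewise illustrative.

```python
def sens_spec(y_true, y_pred):
    """Sensitivity and specificity for 0/1 labels (1 = SIL, 0 = normal).

    Assumes both classes are present in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def sweep(scores, y_true, costs):
    """Return (cost, sensitivity, specificity) for each SIL cost."""
    rows = []
    for c in costs:
        threshold = 1.0 / (1.0 + c)   # minimum expected cost rule (assumed)
        y_pred = [1 if s > threshold else 0 for s in scores]
        sens, spec = sens_spec(y_true, y_pred)
        rows.append((c, sens, spec))
    return rows
```

Calling `sweep` with the costs 1, 2, 2.5, 3, 4 and 5 on any fixed set of scores gives a sensitivity that never decreases and a specificity that never increases as the cost grows, which is the qualitative pattern of Table 7.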

4.4 Comparative Results 

In the previous sections we discussed how the neural network ensembles were applied to 
various stages of this classification task. In this section we compare the final results of
the RBF network ensembles to the best MSA result and to clinical methods. The SIL 
misclassification cost of 2.5 provided the best compromise between sensitivity and specificity. 
To find the variability of the methods, we performed the ensemble averaging 10 times on 
20 different individual classifiers. 
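The repeat-and-pool protocol (10 repetitions, 20 classifiers per ensemble) that yields the "± " figures can be sketched as below. Everything about the individual classifier is a stand-in: `noisy_network` simply adds Gaussian noise to a fixed score vector and is a hypothetical placeholder for training a real RBF network, and the 0.5 decision threshold is an assumption.

```python
import random
from statistics import mean, stdev

def noisy_network(signal, rng, noise=0.3):
    """Hypothetical stand-in for one trained network: true score plus noise."""
    return [s + rng.gauss(0.0, noise) for s in signal]

def accuracy(scores, labels, threshold=0.5):
    """Fraction of patterns whose thresholded score matches the 0/1 label."""
    hits = [1.0 if (s > threshold) == bool(l) else 0.0
            for s, l in zip(scores, labels)]
    return mean(hits)

def repeated_ensembles(signal, labels, n_runs=10, n_nets=20, seed=0):
    """Mean and standard deviation of average-combiner accuracy over runs."""
    rng = random.Random(seed)
    accs = []
    for _ in range(n_runs):
        nets = [noisy_network(signal, rng) for _ in range(n_nets)]
        averaged = [mean(col) for col in zip(*nets)]  # average combiner
        accs.append(accuracy(averaged, labels))
    return mean(accs), stdev(accs)
```

The returned pair corresponds to a "mean ± spread" entry; replacing `mean(col)` with a median gives the median-combiner counterpart.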

The results of both the two-step and one-step RBF algorithms and the results of the
two-step MSA are compared to the accuracy of Pap smear screening and colposcopy in
expert hands in Table 8. A comparison of one-step RBF algorithms to the two-step RBF
algorithms indicates that the one-step algorithms have similar specificities, but a moderate
improvement in sensitivity relative to the two-step algorithms. Compared to the MSA, the
one-step RBF algorithms have a comparable specificity, but a substantially improved
sensitivity. In addition to the improved sensitivity, the one-step RBF algorithms simplify
the decision making process. A comparison between the one-step RBF algorithms and
Pap smear screening indicates that the RBF algorithms have a nearly 30% improvement
in sensitivity with no compromise in specificity; when compared to colposcopy in expert
hands (Fahey et al., 1995), the RBF ensemble algorithms maintain the sensitivity of expert
colposcopists, while improving the specificity by almost 20%.



Table 8: One step RBF algorithm compared to multi-step MSA (Ramanujam et al., 1996)
and clinical methods (Fahey et al., 1995) for differentiating SILs and normal tissue samples.

Algorithm                    Specificity     Sensitivity

2-step MSA                   65%             84%
RBF-single                   65% ± 9%        89% ± 2.8%
RBF-ave                      67% ± 0.75%     91% ± 1.5%
RBF-med                      65.5% ± 0.5%    91% ± 1%

Pap smear (human expert)     68% ± 21%       62% ± 23%
Colposcopy (human expert)    48% ± 23%       94% ± 6%



Figure 5 further illustrates the trade-off between specificity and sensitivity for clinical
methods, MSA and RBF ensembles, obtained by changing the misclassification cost. The 
one-step RBF ensembles provide better sensitivity and higher reliability than any other 
method for a given specificity value. 



5 Discussion 

In this article we demonstrate that cervical tissue fluorescence spectra can be used to develop 
detection algorithms that differentiate SILs from normal tissue samples. Of the various al- 
gorithms explored, the RBF network ensemble proved to be the best alternative, surpassing 
single networks, MLP ensembles, and the multivariate statistical algorithm. 

The classification results of both the multivariate statistical algorithms and the radial 
basis function network ensembles demonstrate that significant improvement in classification 
accuracy can be achieved over current clinical detection modalities using cervical tissue 
spectral data obtained from in vivo fluorescence spectroscopy. The one-step RBF algorithm 
has the potential to significantly reduce the number of pre-cancerous cases missed by Pap 
smear screening and the number of normal tissues misdiagnosed by expert colposcopists. 

The qualitative nature of current clinical detection modalities leads to a significant
variability in classification accuracy. For example, estimates of the sensitivity and specificity of
Pap smear screening have been shown to range from 11-99% and 14-97%, respectively (Fahey
et al., 1995). This limitation can be addressed by exploiting the variance reducing
properties of an ensemble approach. In particular, RBF ensembles demonstrate a significantly
smaller variability in classification accuracy, therefore enabling more reliable classification.
In addition to demonstrating a superior sensitivity, the RBF ensembles simplify the decision
making process of the two-step algorithms based on RBF and MSA into a single step that
discriminates between SILs and normal tissues. We note that for the given data set, both
MSA and MLP were unable to provide satisfactory solutions in one step.

Figure 5: Trade-off between sensitivity and specificity for MSA (Ramanujam et al., 1996)
and RBF ensembles. For reference, Pap smear and colposcopy results from the literature
on various data sets are included (Fahey et al., 1995).

The one-step algorithm development process can be readily implemented in software, 
enabling automated detection of cervical pre-cancer. It can potentially provide near real 
time implementation of pre-cancer detection in the hands of non-experts, and could lead to 
wide-scale implementation of screening and diagnosis, and more effective patient manage- 
ment in the prevention of cervical cancer. The success of this application will represent an 
important step forward in both medical laser spectroscopy and gynecologic oncology. 

Acknowledgements: This research was supported in part by NSF grant ECS 9307632,
AFOSR contract F49620-93-1-0307, and Lifespex, Inc.